Add fio, fio_scale, hammerdb_scale, sysbench workloads to summary report#1240
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe summary report JupyterLab tooling is extended to support FIO, FIO scale, Sysbench, and HammerDB scale workloads. New metric constants, geometric-mean update methods, aggregation paths, and widget filtering/display logic are added. The notebook cells and workload metadata OS version fallbacks are updated. ChangesSummary Report Workload Expansion
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
benchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.py (1)
239-255:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard
result_dfbefore.locrewrites when all workloads are empty.If every workload returns empty,
result_dfstaysNoneand Line 251 crashes on.loc.🛠️ Suggested fix
- result_df = None + result_df = pd.DataFrame() for workload in workloads: median_geometric_mean_df = self.analyze_workload(workload=workload) if median_geometric_mean_df.empty: continue @@ result_df = pd.concat([result_df, median_geometric_mean_df], ignore_index=True) + + if result_df.empty: + return result_df + # change workloads names result_df.loc[result_df['workload'] == 'hammerdb_lso', 'workload'] = 'hammerdb'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.py` around lines 239 - 255, The code initializes result_df as None and only concatenates to it when a workload's median_geometric_mean_df is not empty. If all workloads are empty, result_df remains None, causing the subsequent result_df.loc operations (used for workload name rewrites) to crash. Add a guard condition to check that result_df is not None before executing the workload name rewriting section that uses result_df.loc on lines 251-255.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@benchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.py`:
- Around line 217-226: The update_sysbench_data method conditionally updates
geometric_mean_data_cpu and geometric_mean_data_memory only when specific
columns have non-NA values, causing metadata and list length misalignment when
partial data exists for different UUIDs. Guard the metric accumulation by
updating the shared metadata in both CPU and memory dictionaries only when their
respective geometric mean values are being appended. Additionally, check for
column presence before performing sysbench dispatch updates. Apply the same
defensive guard pattern at the consolidated sites in the file (around lines
364-367 for one location and 433-435 for another).
- Around line 208-209: The logger.info() calls are passing arguments without
format string placeholders, which causes Python's logging module to raise a
TypeError at runtime. Fix these calls by adding format specifiers (%s) to the
message strings. For example, change logger.info('geo_mean_latency',
geo_mean_latency) to logger.info('geo_mean_latency=%s', geo_mean_latency) and
logger.info('geo_mean_iops', geo_mean_iops) to logger.info('geo_mean_iops=%s',
geo_mean_iops). Apply the same pattern to all affected logger.info() calls
throughout the file where extra arguments are passed without corresponding
format placeholders in the message string (also check
logger.info('geometric_mean_df', ...) and any similar calls).
In
`@benchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.py`:
- Around line 169-174: The broad except Exception handler in the
get_workload_data call is masking auth failures, network errors, and other
operational issues as if they were missing index errors. Modify the exception
handling to check if the caught exception is specifically an index-not-found
error (by checking if 'index_not_found' appears in the exception type name). If
it is an index-not-found error, log a warning and return an empty DataFrame as
currently done. For all other exceptions, log the error at error level (not
warning) with the workload context and the actual error details, then re-raise
the exception so it doesn't silently fail and produce misleading reports.
---
Outside diff comments:
In
`@benchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.py`:
- Around line 239-255: The code initializes result_df as None and only
concatenates to it when a workload's median_geometric_mean_df is not empty. If
all workloads are empty, result_df remains None, causing the subsequent
result_df.loc operations (used for workload name rewrites) to crash. Add a guard
condition to check that result_df is not None before executing the workload name
rewriting section that uses result_df.loc on lines 251-255.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 69e98ef1-e956-4fcb-857c-ab676ba5e66c
📒 Files selected for processing (4)
benchmark_runner/jupyterlab/templates/summary_report/summary_report.ipynbbenchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.pybenchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.pybenchmark_runner/workloads/workloads_operations.py
| geo_mean_latency = self.__geometric_mean(average_latency_df['lat_avg_usec']) | ||
| geo_mean_iops = self.__geometric_mean(average_iops_df['total_iops']) | ||
|
|
||
| logger.info('geo_mean_latency', geo_mean_latency) |
There was a problem hiding this comment.
Positional arg not formatted — causes TypeError at runtime. Should be logger.info('geo_mean_latency %s', geo_mean_latency)
There was a problem hiding this comment.
[Claude Code] Already fixed in commit 3232f82 — changed to logger.info('geo_mean_latency=%s', geo_mean_latency) format.
| result_df = pd.concat([result_df, median_geometric_mean_df], ignore_index=True) | ||
| # change workloads names | ||
| result_df.loc[result_df['workload'] == 'hammerdb_lso', 'workload'] = 'hammerdb' | ||
| result_df.loc[result_df['workload'] == 'hammerdb_scale', 'workload'] = 'hammerdb_scale(6 vms)' |
There was a problem hiding this comment.
result_df is None if all workloads return empty — result_df.loc will raise AttributeError. Needs a guard above: if result_df is None: return pd.DataFrame()
There was a problem hiding this comment.
[Claude Code] Fixed — added if result_df is None: return pd.DataFrame() guard before result_df.loc access.
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
♻️ Duplicate comments (4)
benchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.py (2)
208-209:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winFix logger.info() calls: add format string placeholders to avoid logging TypeErrors.
Lines 208–209 pass extra arguments without format placeholders, causing Python's logging module to raise
TypeError: not all arguments converted during string formattingat runtime.🛠️ Suggested fix
- logger.info('geo_mean_latency', geo_mean_latency) - logger.info('geo_mean_iops', geo_mean_iops) + logger.info('geo_mean_latency=%s', geo_mean_latency) + logger.info('geo_mean_iops=%s', geo_mean_iops)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.py` around lines 208 - 209, The logger.info() calls at lines 208-209 are passing extra arguments (geo_mean_latency and geo_mean_iops) without corresponding format placeholders in the format string, which causes Python's logging module to raise a TypeError. Fix these calls by adding format string placeholders (such as %s) in the first argument string for each value being logged, so the format string and arguments align properly with Python's logging expectations.Source: Linters/SAST tools
217-226:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard sysbench metric accumulation for partial data to prevent aggregation crashes.
Current logic copies shared run metadata into CPU/memory dicts conditionally. If only one metric is present for some UUIDs, list lengths diverge and later DataFrame construction can fail.
🛠️ Suggested fix
def update_sysbench_data(self, subset: pd.DataFrame, unique_uuid: str): - self.__update_geometric_mean_data(subset, unique_uuid) - if subset['cpu_events_avg'].notna().all(): - geo_mean_cpu = self.__geometric_mean(subset['cpu_events_avg'].tolist()) - self.geometric_mean_data_cpu.update(self.geometric_mean_data) - self.geometric_mean_data_cpu['geometric_mean'].append(geo_mean_cpu) - if subset['memory_throughput_mib'].notna().all(): - geo_mean_memory = self.__geometric_mean(subset['memory_throughput_mib'].tolist()) - self.geometric_mean_data_memory.update(self.geometric_mean_data) - self.geometric_mean_data_memory['geometric_mean'].append(geo_mean_memory) + def _append_metric(target: dict, values: list): + target['uuid'].append(unique_uuid) + target['ocp_version'].append(subset['ocp_version'].iloc[0]) + target['cnv_version'].append(subset['cnv_version'].iloc[0]) + target['odf_version'].append(subset['odf_version'].iloc[0]) + target['vm_os_version'].append(subset['vm_os_version'].iloc[0]) + target['geometric_mean'].append(self.__geometric_mean(values)) + + if 'cpu_events_avg' in subset.columns: + cpu_values = subset['cpu_events_avg'].dropna().tolist() + if cpu_values: + _append_metric(self.geometric_mean_data_cpu, cpu_values) + if 'memory_throughput_mib' in subset.columns: + mem_values = subset['memory_throughput_mib'].dropna().tolist() + if mem_values: + _append_metric(self.geometric_mean_data_memory, mem_values)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.py` around lines 217 - 226, The update_sysbench_data method conditionally copies shared metadata via the .update() call inside separate if blocks for CPU and memory metrics. This causes list length mismatches when only one metric is present for a UUID—the metadata gets added multiple times for some UUIDs but not others. To fix this, ensure shared run metadata from self.geometric_mean_data is only added once per UUID by moving the .update() calls outside and before the conditional blocks checking for cpu_events_avg and memory_throughput_mib, so the metadata is consistently added regardless of which individual metrics are present. The conditional .append() calls should remain inside their respective if blocks to only add computed values when data exists.benchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.py (2)
239-255:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard against None result_df when all workloads return empty.
If all workloads return empty DataFrames,
result_dfremainsNoneand line 252 (result_df.loc[...]) will raiseAttributeError: 'NoneType' object has no attribute 'loc'.🛠️ Suggested fix
result_df = pd.concat([result_df, median_geometric_mean_df], ignore_index=True) + if result_df is None or result_df.empty: + return pd.DataFrame() # change workloads names result_df.loc[result_df['workload'] == 'hammerdb_lso', 'workload'] = 'hammerdb'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.py` around lines 239 - 255, The code initializes result_df as None and only populates it if at least one workload returns a non-empty DataFrame. If all workloads return empty DataFrames, result_df remains None, causing an AttributeError when the subsequent lines attempt to access result_df.loc for workload name transformations. Add a guard condition after the loop that checks whether result_df is None or empty before executing the workload name replacement operations (the result_df.loc[...] assignments that change workload names like 'hammerdb_lso' to 'hammerdb', 'hammerdb_scale' to 'hammerdb_scale(6 vms)', etc.), ensuring these operations only run when result_df contains valid data.
169-174:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCatch specific Elasticsearch exceptions instead of masking all errors as missing index.
The broad
except Exceptionhandler silently converts auth failures, network errors, and query problems into empty DataFrames, which masks operational issues and produces misleading reports. Index not found is a specific, expected error; others should be re-raised or logged as errors.Consider catching specific exceptions from the elasticsearch library or checking if the caught exception is specifically an index-not-found error:
try: entries = self.summary_report.get_workload_data(index=f'{workload}-results', start_datetime=self.DEFAULT_START_DATE, end_datetime=self.DEFAULT_END_DATE) except Exception as e: # Only treat missing index as a recoverable condition if 'index_not_found' in str(type(e).__name__).lower(): logger.warning(f'No index found for workload: {workload}') return pd.DataFrame() # Propagate other errors logger.error(f'Failed to fetch workload data for {workload}: {e}') raise🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.py` around lines 169 - 174, The exception handling in the try-except block around the get_workload_data method call is too broad and masks authentication failures and network errors by treating all exceptions as missing index conditions. Instead of catching all exceptions with a single warning log, you need to distinguish between index-not-found errors (which are expected and recoverable) and other errors (which indicate operational problems). Modify the except block to check if the caught exception is specifically an index-not-found error by examining the exception type or message, and only return an empty DataFrame with a warning if that specific condition is true. For all other exceptions, log them as errors and re-raise them so they propagate up and don't silently produce misleading empty DataFrames.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In
`@benchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.py`:
- Around line 208-209: The logger.info() calls at lines 208-209 are passing
extra arguments (geo_mean_latency and geo_mean_iops) without corresponding
format placeholders in the format string, which causes Python's logging module
to raise a TypeError. Fix these calls by adding format string placeholders (such
as %s) in the first argument string for each value being logged, so the format
string and arguments align properly with Python's logging expectations.
- Around line 217-226: The update_sysbench_data method conditionally copies
shared metadata via the .update() call inside separate if blocks for CPU and
memory metrics. This causes list length mismatches when only one metric is
present for a UUID—the metadata gets added multiple times for some UUIDs but not
others. To fix this, ensure shared run metadata from self.geometric_mean_data is
only added once per UUID by moving the .update() calls outside and before the
conditional blocks checking for cpu_events_avg and memory_throughput_mib, so the
metadata is consistently added regardless of which individual metrics are
present. The conditional .append() calls should remain inside their respective
if blocks to only add computed values when data exists.
In
`@benchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.py`:
- Around line 239-255: The code initializes result_df as None and only populates
it if at least one workload returns a non-empty DataFrame. If all workloads
return empty DataFrames, result_df remains None, causing an AttributeError when
the subsequent lines attempt to access result_df.loc for workload name
transformations. Add a guard condition after the loop that checks whether
result_df is None or empty before executing the workload name replacement
operations (the result_df.loc[...] assignments that change workload names like
'hammerdb_lso' to 'hammerdb', 'hammerdb_scale' to 'hammerdb_scale(6 vms)',
etc.), ensuring these operations only run when result_df contains valid data.
- Around line 169-174: The exception handling in the try-except block around the
get_workload_data method call is too broad and masks authentication failures and
network errors by treating all exceptions as missing index conditions. Instead
of catching all exceptions with a single warning log, you need to distinguish
between index-not-found errors (which are expected and recoverable) and other
errors (which indicate operational problems). Modify the except block to check
if the caught exception is specifically an index-not-found error by examining
the exception type or message, and only return an empty DataFrame with a warning
if that specific condition is true. For all other exceptions, log them as errors
and re-raise them so they propagate up and don't silently produce misleading
empty DataFrames.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9e42aead-2bb9-49b1-b7e4-16aa5dbe7e04
📒 Files selected for processing (4)
benchmark_runner/jupyterlab/templates/summary_report/summary_report.ipynbbenchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.pybenchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.pybenchmark_runner/workloads/workloads_operations.py
|
@coderabbitai full review |
✅ Action performedFull review finished. |
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
benchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.py (1)
217-230:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix sysbench accumulator corruption with partial metric availability (Line 222).
self.__update_geometric_mean_data(...)runs before deciding which metric bucket to append, so UUID metadata can be added for memory-only runs and later copied into CPU lists (or vice versa). That creates mismatched list lengths and can break DataFrame construction in aggregation.Suggested fix
`@typechecked`() def update_sysbench_data(self, subset: pd.DataFrame, unique_uuid: str): - has_cpu = 'cpu_events_avg' in subset.columns and subset['cpu_events_avg'].notna().all() - has_memory = 'memory_throughput_mib' in subset.columns and subset['memory_throughput_mib'].notna().all() - if not has_cpu and not has_memory: - return - self.__update_geometric_mean_data(subset, unique_uuid) - if has_cpu: - geo_mean_cpu = self.__geometric_mean(subset['cpu_events_avg'].tolist()) - self.geometric_mean_data_cpu.update(self.geometric_mean_data) - self.geometric_mean_data_cpu['geometric_mean'].append(geo_mean_cpu) - if has_memory: - geo_mean_memory = self.__geometric_mean(subset['memory_throughput_mib'].tolist()) - self.geometric_mean_data_memory.update(self.geometric_mean_data) - self.geometric_mean_data_memory['geometric_mean'].append(geo_mean_memory) + has_cpu = 'cpu_events_avg' in subset.columns and subset['cpu_events_avg'].notna().all() + has_memory = 'memory_throughput_mib' in subset.columns and subset['memory_throughput_mib'].notna().all() + if not has_cpu and not has_memory: + return + + def _append_metric(target: dict, metric_values: list): + target['uuid'].append(unique_uuid) + target['ocp_version'].append(subset['ocp_version'].iloc[0]) + target['cnv_version'].append(subset['cnv_version'].iloc[0]) + target['odf_version'].append(subset['odf_version'].iloc[0]) + target['vm_os_version'].append(subset['vm_os_version'].iloc[0]) + target['geometric_mean'].append(self.__geometric_mean(metric_values)) + + if has_cpu: + _append_metric(self.geometric_mean_data_cpu, subset['cpu_events_avg'].tolist()) + if has_memory: + _append_metric(self.geometric_mean_data_memory, subset['memory_throughput_mib'].tolist())🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.py` around lines 217 - 230, The issue is that update_sysbench_data calls self.__update_geometric_mean_data(subset, unique_uuid) unconditionally before deciding which metric bucket (CPU or memory) to append the data to, causing UUID metadata to be added to geometric_mean_data and then incorrectly copied to both geometric_mean_data_cpu and geometric_mean_data_memory lists regardless of which metrics are actually available. Fix this by moving the self.__update_geometric_mean_data(subset, unique_uuid) call to only execute when it's actually needed: call it inside the if has_cpu block before updating geometric_mean_data_cpu, and/or inside the if has_memory block before updating geometric_mean_data_memory, so metadata is only added to the lists that will actually receive data.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@benchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.py`:
- Around line 431-433: The conditional check in the workload in ['fio',
'fio_scale'] branch directly accesses subset['total_iops'] and
subset['lat_avg_usec'] columns without first verifying these columns exist in
the dataframe. Add column existence checks before calling .notna() on these
columns to ensure both 'total_iops' and 'lat_avg_usec' are present in subset. If
either column is missing, the condition should fail gracefully without raising a
KeyError, allowing the code to skip to the next iteration.
In `@benchmark_runner/workloads/workloads_operations.py`:
- Line 447: The __get_metadata method has inconsistent handling of
self._product_versions. Line 447 properly guards the access with an isinstance
check to ensure self._product_versions is a dict before calling .get(), but
subsequent accesses at lines 481, 483, and 485 call .get() directly without this
guard. Since self._product_versions defaults to an empty string in __init__,
these unguarded calls will raise AttributeError. Apply the same isinstance check
pattern to all direct calls to self._product_versions.get() in the
__get_metadata method to ensure consistent and safe handling, or normalize
self._product_versions to a dict at the beginning of the method if it's not
already one.
---
Duplicate comments:
In
`@benchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.py`:
- Around line 217-230: The issue is that update_sysbench_data calls
self.__update_geometric_mean_data(subset, unique_uuid) unconditionally before
deciding which metric bucket (CPU or memory) to append the data to, causing UUID
metadata to be added to geometric_mean_data and then incorrectly copied to both
geometric_mean_data_cpu and geometric_mean_data_memory lists regardless of which
metrics are actually available. Fix this by moving the
self.__update_geometric_mean_data(subset, unique_uuid) call to only execute when
it's actually needed: call it inside the if has_cpu block before updating
geometric_mean_data_cpu, and/or inside the if has_memory block before updating
geometric_mean_data_memory, so metadata is only added to the lists that will
actually receive data.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5c3e0f62-a708-4f19-95c0-b12024623df6
📒 Files selected for processing (4)
benchmark_runner/jupyterlab/templates/summary_report/summary_report.ipynbbenchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.pybenchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.pybenchmark_runner/workloads/workloads_operations.py
8fe63c8 to
392fc91
Compare
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
benchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.py (1)
431-433:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard FIO column existence before
.notna()checks.This branch accesses
subset['total_iops']andsubset['lat_avg_usec']directly without first verifying the columns exist. If either column is missing in an index/document variant, this raisesKeyErrorinstead of skipping gracefully.Compare to
update_sysbench_data(line 218) which defensively checks'cpu_events_avg' in subset.columnsbefore accessing.Suggested fix
elif workload in ['fio', 'fio_scale']: - if not subset.empty and subset['total_iops'].notna().all() and subset['lat_avg_usec'].notna().all(): + if ( + not subset.empty + and 'total_iops' in subset.columns + and 'lat_avg_usec' in subset.columns + and subset['total_iops'].notna().all() + and subset['lat_avg_usec'].notna().all() + ): self.update_fio_data(subset, unique_uuid)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.py` around lines 431 - 433, The condition checking `subset['total_iops'].notna().all()` and `subset['lat_avg_usec'].notna().all()` in the FIO workload branch will raise a KeyError if either column is missing from the dataframe, instead of gracefully skipping. Add defensive column existence checks using `in subset.columns` for both 'total_iops' and 'lat_avg_usec' before attempting to access them with `.notna()`, following the same pattern used in the update_sysbench_data method which defensively checks for 'cpu_events_avg' in subset.columns.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@benchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.py`:
- Line 230: The rename operation on the median_geometric_mean_df in the rename()
call is renaming the 'result' column to a tuple value (the return of
self.get_two_last_major_versions()), but the downstream
upload_report_to_elasticsearch() function expects a column named 'Diff %' to
read the difference percentage. This mismatch causes the column lookup to fail,
defaulting diff_pct to 0 and incorrectly marking changes as stable. Fix this by
renaming the 'result' column to 'Diff %' instead of using the tuple from
self.get_two_last_major_versions() to maintain the column contract expected by
upload_report_to_elasticsearch().
---
Duplicate comments:
In
`@benchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.py`:
- Around line 431-433: The condition checking
`subset['total_iops'].notna().all()` and `subset['lat_avg_usec'].notna().all()`
in the FIO workload branch will raise a KeyError if either column is missing
from the dataframe, instead of gracefully skipping. Add defensive column
existence checks using `in subset.columns` for both 'total_iops' and
'lat_avg_usec' before attempting to access them with `.notna()`, following the
same pattern used in the update_sysbench_data method which defensively checks
for 'cpu_events_avg' in subset.columns.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ab3587cc-236a-4939-b0f0-a24a61328c19
📒 Files selected for processing (4)
benchmark_runner/jupyterlab/templates/summary_report/summary_report.ipynbbenchmark_runner/jupyterlab/templates/summary_report/summary_report_operations.pybenchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.pybenchmark_runner/workloads/workloads_operations.py
| logger.warning(f'No median geometric mean data for workload: {workload}') | ||
| return pd.DataFrame() | ||
| v1, v2 = self.get_two_last_major_versions() | ||
| median_geometric_mean_df.rename(columns={'previous_val': v1, 'geometric_mean': v2, 'result': self.get_two_last_major_versions()}, inplace=True) |
There was a problem hiding this comment.
Line 230 breaks the diff-column contract used by Elasticsearch upload.
Line 230 renames result to a tuple (self.get_two_last_major_versions()), but upload_report_to_elasticsearch() reads record.get('Diff %', 0) (Lines 391-392). That makes uploaded diff_pct default to 0 and marks statuses as stable, even when there is a real change.
Suggested fix
- median_geometric_mean_df.rename(columns={'previous_val': v1, 'geometric_mean': v2, 'result': self.get_two_last_major_versions()}, inplace=True)
+ median_geometric_mean_df.rename(columns={'previous_val': v1, 'geometric_mean': v2, 'result': 'Diff %'}, inplace=True)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@benchmark_runner/jupyterlab/templates/summary_report/summary_report_widgets.py`
at line 230, The rename operation on the median_geometric_mean_df in the
rename() call is renaming the 'result' column to a tuple value (the return of
self.get_two_last_major_versions()), but the downstream
upload_report_to_elasticsearch() function expects a column named 'Diff %' to
read the difference percentage. This mismatch causes the column lookup to fail,
defaulting diff_pct to 0 and incorrectly marking changes as stable. Fix this by
renaming the 'result' column to 'Diff %' instead of using the tuple from
self.get_two_last_major_versions() to maintain the column contract expected by
upload_report_to_elasticsearch().
072e505 to
5528e9b
Compare
- Add fio, fio_scale, hammerdb_scale, sysbench to summary report - Dynamic OS versions for bootstorm (from product_versions config) - Product versions display section in notebook - Graceful error handling for missing ES indexes - Fix logger.info() format strings to avoid TypeError - Guard result_df None in analyze_all_workload - Guard sysbench partial data to prevent aggregation crash - Guard FIO column existence before .notna() checks - Use normalized product_versions consistently in __get_metadata Assisted-by: Claude Code
5528e9b to
eed14b0
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: arpsharm, ebattat The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
Summary
BOOTSTORM_FEDORA/BOOTSTORM_WINDOWSconstants — bootstorm now readsvm_os_versiondynamically fromproduct_versionsconfig; defaultdb_vm_os_versionupdated fromcentos-stream9tocentos-stream10environment_variables.pyupdate_hammerdb_dataonly logs columns that exist in the data (fixeshammerdb_scaleKeyError)Changes
summary_report_operations.pyFIO_IOPS_METRIC,FIO_LATENCY_METRIC,SYSBENCH_CPU_METRIC,SYSBENCH_MEMORY_METRICconstantsBOOTSTORM_FEDORAandBOOTSTORM_WINDOWS— metric names set directly without OS matchingupdate_fio_data(),update_sysbench_data()methods for new workload data processingaggregate_fio_dataframe(),aggregate_sysbench_dataframe()methodsprepare_workload_dataframe(),calc_median_geometric_mean_df(),aggregate_workload_dataframe()to dispatch fio, fio_scale, hammerdb_scale, sysbenchcalc_median_geometric_mean_df()summary_report_widgets.pyfio,fio_scale,hammerdb_scale,sysbenchtoSTORAGE_TYPESget_filtered_df()(fio/fio_scale use scale field, hammerdb_scale uses scale=6)analyze_all_workload()fio_scale→fio_scale(6 vms),hammerdb_scale→hammerdb_scale(6 vms),vdbench_scale→vdbench_scale(6 vms)get_product_versions()— reads fromenvironment_variables.pyget_workload_filtered_data()for missing ES indexesanalyze_workload()andanalyze_all_workload()summary_report.ipynbworkloads_operations.pyvm_os_versionreads fromdb_vm_os_version(updated default tocentos-stream10)vm_os_versionreads dynamically fromproduct_versions['vm_os_version']instead of hardcodedfedora37Test plan
🤖 Assisted-by: Claude Code
Summary by CodeRabbit
Release Notes
New Features
Improvements
Bug Fixes